REGULATORY ELEMENT FOR EXPRESSING GENES IN PLANTS



Field of the Invention 

The present invention is directed to nucleic acid sequences that control the expression of genes in eukaryotic cells. More particularly, the invention is directed to a gene promoter that confers a high level of expression on genes that are operably linked to the promoter.

Background and Summary of the Invention 

The present invention relates to a novel regulatory element that confers a high level of expression in plant cells on genes that are operably linked to the regulatory element. The ability to control the level of gene expression in plants is important for many applications of genetic transformation procedures, including those directed to crop improvement.

In eukaryotic organisms, multi-level regulatory systems exist to control gene expression. Transcription, the synthesis of mRNA molecules, is an integral part of such systems. The efficiency of transcription is largely determined by a region of DNA called the promoter, which consists of gene sequences upstream of the site of transcription initiation. The components of the promoter region include the "TATA" box and often a "CAAT" box. In addition, many other regulatory elements that affect transcription may be present in the promoter sequences. The coordinated action of cellular proteins (transcription factors) interacting with promoter sequences determines the specificity and effectiveness of a particular promoter. Because most eukaryotic genes are stringently regulated, few promoters that confer strong, constitutive expression are available.

The present invention describes the isolation and purification of a DNA sequence that drives high-level expression of operably linked genes in plant cells. The promoter sequence described in the present invention expresses genes at a level equal to or higher than that obtained from one of the strongest promoters presently available, the 35S cauliflower mosaic virus promoter. Such promoters are needed to direct a high level of protein expression in transgenic plants. The strong promoter of the present invention is used to construct expression vectors for expressing genes in plant cells. In one embodiment, a plant expression vector is provided that comprises the regulatory element of SEQ ID NO: 2 operably linked to a non-natively associated gene, and this vector is used to produce transgenic plants.

Brief Description of the Drawings

Fig. 1 represents a restriction map of the 4.8 kb HindIII Arabidopsis genomic fragment that hybridizes to the BglI RTS-1 gene fragment.

Fig. 2 shows expression of gusA in Arabidopsis protoplasts when gusA is operably linked to the 35S cauliflower mosaic virus promoter (p35SGUS), the promoter of SEQ ID NO: 2 (pUN-GUS), or the promoter of SEQ ID NO: 3 (pASR-GUS), or when gusA lacks a promoter (DNA-).



Detailed Description of the Invention 
Definitions 

Unless specified otherwise, any reference to DNA, a DNA sequence, promoter, or regulatory sequence is a reference to a double-stranded DNA sequence. A promoter is a DNA sequence that directs the transcription of a structural gene. Typically, a promoter is located in the 5' region of a gene, proximal to the transcription start site of a structural gene. If a promoter is an inducible promoter, then the rate of transcription increases in response to an inducing agent. In contrast, if the promoter is a constitutive promoter, then the rate of transcription is not regulated by an inducing agent.

An enhancer is a DNA regulatory element that can increase the efficiency of transcription, regardless of the distance or orientation of the enhancer relative to the start site of transcription.

The term "expression" refers to the biosynthesis of a gene product. For 
example, in the case of a structural gene, expression involves the transcription of the 
structural gene into messenger RNA and the translation of messenger RNA into one or 
more polypeptides. 

An expression vector is a DNA molecule comprising the regulatory elements necessary for transcription of a gene in a host cell. Typically the gene is placed under the control of certain regulatory elements, including constitutive or inducible promoters, tissue-specific regulatory elements, and enhancer elements. Such a gene is said to be "operably linked to" the regulatory elements when the regulatory element controls the expression of the gene. Expression vectors typically include eukaryotic and/or bacterial selectable markers that allow for selection of cells containing the expression vector.

An exogenous DNA sequence refers to a DNA sequence that has been introduced into a host cell from an external source. A transgenic plant is a plant having one or more plant cells that contain an exogenous DNA sequence. The term "stably transformed" refers to a transformed cell or plant that is capable of transmitting an exogenous DNA sequence to its progeny. Typically a stably transformed host has the exogenous DNA sequence integrated into its genome.

A core promoter contains the essential nucleotide sequences for promoter function, including the TATA box and start of transcription. By this definition, a core promoter may or may not have detectable activity in the absence of specific sequences (regulatory elements) that may enhance the activity of the core promoter or confer tissue-specific activity.

A visible marker is defined herein as including any gene that encodes a product that confers a phenotypic trait on the host cell or organism.

A selectable marker is defined herein as including any nucleic acid sequence or gene product that can be selected for after introduction into a cell. The selectable marker facilitates the identification of transformants.

A polylinker is a DNA sequence that contains multiple restriction endonuclease recognition sequences in close proximity to one another.
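As an illustration of this definition, the minimal Python sketch below scans a sequence for several restriction endonuclease recognition sequences and reports their 1-based start positions. The polylinker is hypothetical, assembled here from four well-known recognition sites with short spacers; it is not a sequence disclosed in this invention, and `find_sites` is an illustrative name only.

```python
# Recognition sequences (5'->3', top strand). HindIII, EcoRI and XbaI appear in
# the Examples of this document; BamHI is added purely for illustration.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "XbaI": "TCTAGA",
    "HindIII": "AAGCTT",
}


def find_sites(seq: str, sites: dict = RECOGNITION_SITES) -> dict:
    """Return 1-based start positions of each recognition sequence in seq.

    All four sites above are palindromic, so scanning the top strand is
    enough to find them on both strands.
    """
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)  # 0-based index -> 1-based position
            start = seq.find(site, start + 1)
        hits[enzyme] = positions
    return hits


# Hypothetical polylinker: four sites separated by 2-nt spacers.
polylinker = "GAATTC" + "GG" + "GGATCC" + "AA" + "TCTAGA" + "CC" + "AAGCTT"
print(find_sites(polylinker))
# {'EcoRI': [1], 'BamHI': [9], 'XbaI': [17], 'HindIII': [25]}
```

Inserting a gene between two unique sites of such a polylinker is what places it adjacent to, and under the control of, a promoter cloned next to the polylinker.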

The present invention is directed to a substantially purified genomic DNA sequence isolated from Arabidopsis thaliana (SEQ ID NO: 1). The genomic DNA encodes two proteins (ASR-2 and ORF 3) and contains a dual promoter region, located between those two genes, that drives the expression of both genes (see Fig. 1).

The genomic region containing the coding DNA sequence for ASR-2, located between nucleotides 945 and 3694 of SEQ ID NO: 1, encompasses sequences that are homologous to the human pre-mRNA splicing factor ASF/SF2 and the Arabidopsis SR1 gene. Alignment of the ASR-2 genomic DNA sequences with the SR1 cDNA sequence indicated the presence of eleven putative exons in the ASR-2 gene, and the deduced amino acid sequence has 82% identity (92% similarity) with the deduced amino acid sequence of SR1. The sequence identity of ASR-2 with the human splicing factor SF2 was 62%, compared with 59% identity between the SR1 and SF2 genes. The ASR-2 gene also appears to have the same structural organization of RNA-binding domains, glycine spacer, and SR domain as is observed in the SR1 and SF2 genes. The ASR-2 coding sequence also includes a highly charged PSK domain at the C-terminal end, similar to the SR1 gene but absent from the ASF/SF2 coding sequences.

The regulatory elements controlling the expression of the ASR-2 gene are contained within the 530-nucleotide region shown as SEQ ID NO: 3. The expression of the ASR-2 gene was analyzed by reverse transcription PCR in different parts of the Arabidopsis plant. The ASR-2 gene was found to be expressed in all plant parts investigated, including the leaves, stems, siliques, and roots, and similar levels of expression were observed in the different plant organs. The experiment revealed the presence of more than one transcript hybridizing to the ASR-2 probe (the 2.4 kb EcoRI fragment of the ASR-2 genomic clone); these transcripts could represent splice variants of ASR-2 transcripts.

The 4.8 kb HindIII genomic fragment encodes another gene, ORF 3, which is located on the complementary strand relative to the sequence encoding the ASR-2 gene (see Fig. 1), between nucleotides 4217 and 4917 of SEQ ID NO: 1. A 530 bp region located between the ASR-2 and ORF 3 genes (at positions 3691-4220 of SEQ ID NO: 1) functions as a dual promoter for expressing both ASR-2 and ORF 3. The sequence of the DNA region that contains the regulatory elements for expressing ORF 3 is shown as SEQ ID NO: 2.
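Coordinates in SEQ ID NO: 1 are 1-based and inclusive, so a region from position 3691 to 4220 spans 4220 - 3691 + 1 = 530 nucleotides. The short Python sketch below, with illustrative helper names, makes that convention explicit and shows how it maps onto Python's 0-based, end-exclusive slices.

```python
def region_length(start: int, end: int) -> int:
    """Length of a region given 1-based, inclusive start and end coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def extract_region(seq: str, start: int, end: int) -> str:
    """Extract a region using 1-based, inclusive coordinates.

    Python slices are 0-based and end-exclusive, hence seq[start - 1:end].
    """
    return seq[start - 1:end]


# Coordinates of the dual promoter region in SEQ ID NO: 1, as stated above.
PROMOTER_START, PROMOTER_END = 3691, 4220
print(region_length(PROMOTER_START, PROMOTER_END))  # 530

# Toy check of the slicing convention on a made-up sequence.
print(extract_region("ACGTACGTAC", 3, 6))  # GTAC
```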

SEQ ID NO: 2 and SEQ ID NO: 3 are reverse complements of each other; accordingly, a double-stranded DNA sequence that contains SEQ ID NO: 2 will also contain SEQ ID NO: 3. As used herein with reference to double-stranded DNA sequences, SEQ ID NO: 2 and SEQ ID NO: 3 designate the orientation of the 530 bp region in DNA constructs. If the 530 bp region is ligated to a gene through its 3' end (as shown in SEQ ID NO: 1), the sequence is referred to as SEQ ID NO: 2; if the 530 bp region is ligated to a gene through its 5' end, the sequence is referred to as SEQ ID NO: 3. For example, a gene operably linked to a promoter comprising the sequence of SEQ ID NO: 2 designates that the promoter is operably linked to that gene in the orientation that naturally expresses the ORF 3 gene.
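The reverse-complement relationship between SEQ ID NO: 2 and SEQ ID NO: 3 can be sketched in a few lines of Python. The function name is illustrative; the input is the first ten nucleotides of SEQ ID NO: 2 from the sequence listing. Under the stated relationship, the last ten nucleotides of SEQ ID NO: 3 would be the reverse complement of these ten.

```python
# Watson-Crick complements; N (any base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


# First 10 nt of SEQ ID NO: 2 as listed in the sequence listing.
seq2_start = "CATTTATTTC"
print(reverse_complement(seq2_start))  # GAAATAAATG
```

Applying the function twice returns the original sequence, which is why a double-stranded molecule carrying one orientation necessarily carries the other.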

The 530 bp region located between the ASR-2 and ORF 3 genes contains sequences that are known to bind proteins involved in the transcriptional process, and it can function in either direction. Accordingly, this sequence can be used in either orientation as a promoter for expressing genes in eukaryotic cells, and more particularly in plant cells. The present invention is directed to a substantially pure DNA sequence comprising the sequence of SEQ ID NO: 2, and to the use of such a sequence to express exogenous genes in plants.

In accordance with one embodiment, a recombinant expression vector is prepared comprising a promoter having a consecutive 20 base pair sequence identical to the sequence of SEQ ID NO: 2 or SEQ ID NO: 3. Typically the expression vector will also include a polylinker region located adjacent to the promoter such that, upon insertion of a gene sequence into the polylinker, the gene will be operably linked to the promoter. In one embodiment the promoter utilized is the DNA sequence of SEQ ID NO: 2. The expression vector typically includes a eukaryotic selectable marker gene or a visible marker gene to allow identification of plant cells transformed with the exogenous DNA sequence. In one embodiment the expression vector further includes a prokaryotic selectable marker gene and a prokaryotic origin of replication that allow for the transformation and propagation of the expression vector in prokaryotes.
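Whether a candidate promoter has "a consecutive 20 base pair sequence identical to" SEQ ID NO: 2 or SEQ ID NO: 3 can be checked by comparing k-mers. The following is a minimal sketch, assuming exact, ungapped identity; `shares_window` is a hypothetical helper, and the reference is truncated to the first 60 nt of SEQ ID NO: 2 for brevity (the full 530 nt would be used in practice). Because SEQ ID NO: 3 is the reverse complement of SEQ ID NO: 2, both orientations are covered by indexing the reference and its reverse complement.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def shares_window(candidate: str, reference: str, k: int = 20) -> bool:
    """True if candidate contains k consecutive bases identical to reference
    or to its reverse complement."""
    candidate, reference = candidate.upper(), reference.upper()
    kmers = set()
    for strand in (reference, reverse_complement(reference)):
        kmers.update(strand[i:i + k] for i in range(len(strand) - k + 1))
    return any(candidate[i:i + k] in kmers for i in range(len(candidate) - k + 1))


# Reference: first 60 nt of SEQ ID NO: 2, copied from the sequence listing.
ref = ("CATTTATTTC" "TTTCCTATAC" "CAAAATCAAA"
       "ATTGAGATTC" "GAAACGTCAA" "TAGATCGAAA")
print(shares_window("GG" + ref[10:30] + "GG", ref))                      # True
print(shares_window("GG" + reverse_complement(ref[30:50]) + "GG", ref))  # True
print(shares_window("A" * 25, ref))                                      # False
```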

In accordance with the present invention, a DNA construct comprising the regulatory element of SEQ ID NO: 2, a core promoter and a gene operably linked to the core promoter is used to transform a plant cell, using procedures known to those familiar with the art. Such transformation procedures include, but are not limited to, microinjection, microprojectile bombardment, electroporation, calcium chloride permeabilization, polyethylene glycol permeabilization, protoplast fusion, or bacterium-mediated mechanisms such as Agrobacterium tumefaciens or Agrobacterium rhizogenes.

Transformed cells (those containing the DNA inserted into the host cell's DNA) are selected from untransformed cells through the use of a selectable marker included as part of the introduced DNA sequences. Transformed cells or plant entities can also be identified by the expression of a visible marker included as part of the introduced DNA sequences. Visible markers include genes that impart a visible phenotypic trait such as seed color (e.g., yellow, purple or white genes) or shape (e.g., shrunken or plump genes). Selectable markers include genes that provide antibiotic resistance or herbicide resistance. Cells containing selectable marker genes are capable of surviving in the presence of antibiotic or herbicide concentrations that kill untransformed cells. Examples of selectable marker genes include the bar gene, which provides resistance to the herbicide Basta; the nptII gene, which confers kanamycin resistance; and the hpt gene, which confers hygromycin resistance. An entire plant can be generated from a single transformed plant cell through cell culturing techniques known to those skilled in the art.

In one embodiment a transgenic plant entity is provided, wherein the plant entity consists essentially of a plant cell, seed or plant produced from the in vitro introduction of an exogenous nucleic acid sequence into a plant cell, and wherein the exogenous nucleic acid sequence encodes a gene whose expression is controlled by the regulatory elements of SEQ ID NO: 2. More particularly, the transgenic plant is generated by transforming a plant cell with a DNA vector comprising a promoter, having a consecutive 20 base pair sequence identical to the sequence of SEQ ID NO: 2, operably linked to a gene. In one embodiment, the DNA vector used to transform the plant cell comprises the 530 bp sequence of SEQ ID NO: 2 operably linked to a gene. The gene may encode any product that is beneficial to the plant (for example, gene products that directly or indirectly provide herbicide resistance, insect resistance or fungal resistance, or that act as growth regulators), or may encode pharmaceutical or polymer components that are subsequently purified from plant material for commercial use. The exogenous nucleic acid sequences used to produce the transgenic plant typically also include a selectable marker gene or a visible marker gene to allow identification of the cells transformed with the exogenous DNA sequence. In accordance with one embodiment, a plant expression vector comprising a regulatory element operably linked to a non-natively associated gene is used to produce a transgenic plant, wherein the regulatory element is selected from the sequence of SEQ ID NO: 2.
The regulatory element of SEQ ID NO: 2 has been demonstrated to be highly efficient in transcribing genes in Arabidopsis cells (see Example 2 for details). As shown in Fig. 2, the regulatory element of SEQ ID NO: 2 ligated to the gusA coding sequence induced GUS activity at a level of 1.72 nmol MU/hr/1,000 protoplasts, whereas the 35S cauliflower mosaic virus promoter operably linked to the gusA coding sequence produced GUS activity at 1 nmol MU/hr/1,000 protoplasts. The regulatory element of SEQ ID NO: 3 ligated to the gusA coding sequence exhibited a low level of GUS activity in Arabidopsis. Accordingly, the 530 bp region, as shown in SEQ ID NO: 2, functions as a strong promoter when operably linked to an exogenous gene in the orientation that naturally expresses the ORF 3 gene in Arabidopsis.
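The comparison above amounts to a simple ratio of the two reported GUS activities. The Python sketch below computes it; the helper name is illustrative, and because the text reports single values without replicates, no error estimate is attempted.

```python
def relative_activity(test: float, reference: float) -> float:
    """Activity of a test construct expressed as a multiple of a reference construct."""
    if reference <= 0:
        raise ValueError("reference activity must be positive")
    return test / reference


# GUS activities reported for Fig. 2, in nmol MU/hr/1,000 protoplasts.
gus = {"p35SGUS": 1.0, "pUN-GUS": 1.72}
fold = relative_activity(gus["pUN-GUS"], gus["p35SGUS"])
print(f"pUN-GUS / p35SGUS = {fold:.2f}-fold ({(fold - 1) * 100:.0f}% higher)")
# pUN-GUS / p35SGUS = 1.72-fold (72% higher)
```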



Example 1 

Isolation of the Genomic Fragment Encoding SEQ ID NO: 1 

A genomic library of Arabidopsis thaliana ecotype RDL (prepared by ligation of HindIII partially digested genomic DNA fragments, ranging between 8 and 23 kb, into the HindIII site of the binary cosmid pBIC20) was screened with a BglI fragment of the rice anther-specific cDNA clone RTS-1 (SEQ ID NO: 4) to isolate DNA fragments containing homologous sequences.

The RTS-1 cDNA clone is a tapetum-specific gene that encodes an alanine-rich protein expressed in tapetum cells of rice anthers. The gene is more fully described in PCT application serial no. PCT/US96/16418, published on April 17, 1997 (publication no. WO97/13401), the disclosure of which is expressly incorporated herein.

Library screening was performed in large Petri dishes (20 x 20 cm) containing approximately 20,000 recombinant colonies of E. coli NM554 cells. Such density should represent about three Arabidopsis genome equivalents. The recombinant colonies were lifted onto Hybond-N hybridization transfer membranes (Amersham), and the membrane-bound DNA (fixed by UV irradiation) was probed with the BglI cDNA fragment of the RTS-1 gene (SEQ ID NO: 4). Membranes were prehybridized at 50°C for 1 hr in prehybridization solution containing 5x SSPE, 5x Denhardt's solution, 0.5% SDS, and 0.2 mg/ml denatured salmon-sperm DNA. Hybridization was performed overnight at 50°C. The filters were washed twice at 50°C in 3x SSC solution for 15 min, once at 50°C for 15 min in 1x SSC solution, and in 0.2x SSC solution at 50°C for 15 min, followed by a 30 min incubation at room temperature. Washed filters were wrapped in SaranWrap and autoradiography was carried out overnight.
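The "about three genome equivalents" figure can be sanity-checked with standard library-coverage arithmetic. The sketch below is illustrative only: the mean insert size (15.5 kb, the midpoint of the stated 8-23 kb range) and the genome size (about 125 Mb) are assumptions, not values given in the text, and the Clarke-Carbon relation P = 1 - (1 - f)^N is a textbook estimate rather than a calculation reported by the source.

```python
def genome_equivalents(n_clones: int, mean_insert_kb: float, genome_mb: float) -> float:
    """Total cloned DNA divided by genome size."""
    return n_clones * mean_insert_kb / (genome_mb * 1000.0)


def clarke_carbon_probability(n_clones: int, mean_insert_kb: float, genome_mb: float) -> float:
    """Probability that a given sequence is represented: P = 1 - (1 - f)**N,
    where f is the insert size as a fraction of the genome."""
    f = mean_insert_kb / (genome_mb * 1000.0)
    return 1.0 - (1.0 - f) ** n_clones


# ASSUMED parameters: 15.5 kb mean insert, ~125 Mb Arabidopsis genome.
ge = genome_equivalents(20_000, 15.5, 125.0)
p = clarke_carbon_probability(20_000, 15.5, 125.0)
print(f"{ge:.2f} genome equivalents, P(sequence present) = {p:.2f}")
# 2.48 genome equivalents, P(sequence present) = 0.92
```

Under these assumptions the estimate is of the same order as the "about three" stated above; a larger true mean insert size or a smaller genome estimate would move it closer.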
Twenty-three independent clones hybridizing to the probe were identified and selected for restriction endonuclease analysis and Southern blotting. Most of the clones gave rise to multiple signals of varying intensity upon probing with the BglI cDNA fragment of the RTS-1 gene (SEQ ID NO: 4). The initial restriction endonuclease and Southern blot analysis identified genomic clone #2 as having a 4.8 kb HindIII fragment that hybridizes to the RTS-1 probe. This clone was selected for further detailed analysis. When the 4.8 kb HindIII fragment was subsequently digested into the 2.2 kb and 2.4 kb EcoRI fragments, both fragments hybridized to the probe, indicating two independent probe binding sites.

The 4.8 kb HindIII DNA fragment was sequenced using standard techniques. For sequence analysis, the 2.2 kb and 2.4 kb EcoRI fragments internal to the 4.8 kb HindIII fragment (see Fig. 1) were subcloned into the pBluescript KS +/- vector. In addition, DNA fragments generated by digestion of the EcoRI fragments with XbaI were subcloned to facilitate the sequencing process. The sequence of the cross-hatched region shown in Fig. 1 is shown as SEQ ID NO: 1.

A simple homology search for sequences similar to the RTS-1 probe identified three possible binding sites within the 4.8 kb HindIII fragment (indicated as boxes above the cross-hatched region of Fig. 1). The matching percentage was in the range of 35-39% over the 190 bp probe fragment. Experimental results on restriction fragment hybridization to the RTS-1 probe were in agreement with the predicted positions of the probe binding sites. The longest open reading frame is located in one region of probe binding, and it was selected for further analysis.
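The text does not specify how the "simple homology search" was performed. One plausible reading, sketched below purely as an illustration, is an ungapped scan that slides the probe along the target and records the best percent identity; it is not necessarily the method used, and the function name and toy sequences are hypothetical.

```python
def best_ungapped_match(probe: str, target: str) -> tuple:
    """Slide probe along target without gaps; return (0-based offset,
    percent identity) of the best-scoring window."""
    probe, target = probe.upper(), target.upper()
    if len(probe) > len(target):
        raise ValueError("probe must not be longer than target")
    best_offset, best_matches = 0, -1
    for offset in range(len(target) - len(probe) + 1):
        window = target[offset:offset + len(probe)]
        matches = sum(p == t for p, t in zip(probe, window))
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, 100.0 * best_matches / len(probe)


# Toy sequences, not RTS-1 or SEQ ID NO: 1.
print(best_ungapped_match("ACGT", "TTACGAAA"))  # (2, 75.0)
```

For a 190 bp probe, 35-39% identity corresponds to roughly 67-74 matching positions. A real search would also scan the reverse complement of the target, since hybridization can occur on either strand.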

A genomic fragment from nucleotide 825 to nucleotide 3694 of SEQ ID NO: 1 was identified as containing sequences homologous to the human splicing factor ASF/SF2 and the Arabidopsis SR1 gene, and that region was designated as the ASR-2 region (see Fig. 1). Alignment of the genomic DNA sequences of this gene with the SR1 cDNA sequences indicated the presence of eleven putative exons, with 82% identity (92% similarity) between the deduced amino acid sequences. The sequence identity to the human splicing factor SF2 was 62%, compared with 59% identity between the SR1 and SF2 genes. The same structural organization of RNA-binding domains, glycine spacer, and SR domain was observed among all three genes. The ASR-2 coding sequence also included a highly charged PSK domain at the C-terminal end, similar to the SR1 gene but absent from the ASF/SF2 coding sequences.

The presence of ASR-2 transcripts was analyzed by reverse transcription PCR in different parts of the Arabidopsis plant. Total RNA was isolated from various Arabidopsis organs, and reverse transcription of 5 of the total RNA, treated with RNase-free DNase, was performed with MuMLV reverse transcriptase (400 units) and oligo(dT)18-22 primer (4 µg) for 1 h at 42 °C, followed by 5 min at 95 °C. Following incubation, the reaction mixture was treated with RNase H (8 units) for 20 min at 37 °C. Five µl of the reverse transcription reaction was then amplified with Taq polymerase (Perkin-Elmer Cetus) using primers that recognize the first exon in the RNA recognition domain and the SR domain of ASR-2. Primer sequences were selected that were specific to the ASR-2 domains but not to the SR1 homologous domains.

The ASR-2 gene was found to be expressed in all plant parts investigated, including the leaves, stems, siliques, and roots. Similar levels of expression were observed in different plant organs. The experiment revealed the presence of more than one transcript hybridizing to the ASR-2 probe (the 2.4 kb EcoRI fragment of the ASR-2 genomic clone). Transcripts shorter than expected were identified among the RT-PCR reaction products and could represent splice variants of ASR-2 transcripts. Sequencing of the major amplified RT-PCR product confirmed all predicted intron-exon junction sites except the splicing sites (5' as well as 3') of intron #7. Such transcripts contain the SR domain message, but they cannot be translated into the full-length protein because splicing of intron #7 generates a frame shift leading to a stop codon just after the splice site.
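How a shifted splice site can expose a premature stop codon is illustrated by the minimal Python sketch below. The exon sequences and the 2-nucleotide shift are hypothetical toys, not ASR-2 sequence; the function reports the 0-based position of the first in-frame stop codon (TAA, TAG or TGA).

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq: str, frame: int = 0) -> Optional[int]:
    """Return the 0-based index of the first in-frame stop codon, or None."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical toy exons.
exon_a = "ATGGCC"
exon_b = "AGTAAGGA"
normal = exon_a + exon_b        # ATG GCC AGT AAG GA -> no stop codon
shifted = exon_a + exon_b[2:]   # ATG GCC TAA GGA    -> stop just after the junction
print(first_stop(normal), first_stop(shifted))  # None 6
```

Losing a number of nucleotides that is not a multiple of three at a junction changes the reading frame of everything downstream, which is why such a transcript can still carry the SR domain sequence yet fail to encode the full-length protein.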



Example 2 

Expression of gusA Using the Promoters of SEQ ID NO: 2 and SEQ ID NO: 3

Arabidopsis protoplasts were transformed with bacterial vectors containing the 5' untranslated ASR-2 DNA sequence connected to the coding sequences of the bacterial β-glucuronidase gene. The coding sequences were ligated to the 3' end and to the 5' end of the promoter sequence, and the respective constructs were designated pUN-GUS [having the promoter oriented in the direction that normally transcribes the ORF 3 gene (i.e., SEQ ID NO: 2) and operably linked to the gusA gene] and pASR-GUS [having the promoter oriented in the direction that normally transcribes the ASR-2 gene (i.e., SEQ ID NO: 3) and operably linked to the gusA gene]. The sequences of the junction site between the promoter sequence and the gusA coding sequence are disclosed as SEQ ID NO: 5 for pASR-GUS and SEQ ID NO: 6 for pUN-GUS, wherein the ATG start codon is located at nucleotide 6 and the coding region of gusA begins at nucleotide 30.
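A quick check of the junction geometry: if the ATG begins at nucleotide 6 and the gusA coding region begins at nucleotide 30 (both read here as the first nucleotide of each feature, an interpretation rather than an explicit statement in the text), the offset is 24 nt, a multiple of three, so gusA would be read in the same frame as the ATG. The helper name below is illustrative.

```python
def junction_frame(atg_position: int, coding_start: int) -> tuple:
    """Distance (nt) from the A of the ATG to the first coding nucleotide,
    and whether that distance preserves the ATG's reading frame."""
    offset = coding_start - atg_position
    return offset, offset % 3 == 0


offset, in_frame = junction_frame(6, 30)
print(offset, in_frame, offset // 3)  # 24 True 8
```

The 24 nt correspond to eight codons, counting the ATG itself, ahead of the gusA coding sequence.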

The unique DNA promoter sequence discovered and claimed in the present invention is located between nucleotides 3690 and 4221 of SEQ ID NO: 1 (that is, at positions 3691-4220). The sequence is 530 nucleotides in length and is presented as SEQ ID NO: 2 and SEQ ID NO: 3. The 530 bp sequence contains numerous transcription factor binding sites, including two "TATA" boxes at positions 141 and 316, "CAAT" boxes at positions 223, 386, 448, and 486, and one zeste element (GTGAGTG) at position 264 of SEQ ID NO: 2.
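Motif positions like those listed above can be located with a regular-expression scan. The sketch below is illustrative: the TATA-box consensus TATAWAW (W = A or T) is an assumption, since the source only says "TATA" box, and the input is restricted to the first 180 nt of SEQ ID NO: 2 copied from the sequence listing, so the CAAT boxes (positions 223 and beyond) and the second TATA box (316) fall outside it.

```python
import re

# TATAWAW is an ASSUMED consensus; the zeste element is as given in the text.
MOTIFS = {
    "TATA (TATAWAW)": "TATA[AT]A[AT]",
    "zeste (GTGAGTG)": "GTGAGTG",
}


def find_motifs(seq: str, motifs: dict = MOTIFS) -> dict:
    """Return 1-based start positions of every (possibly overlapping) match."""
    seq = seq.upper()
    # A zero-width lookahead lets overlapping matches be reported.
    return {name: [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", seq)]
            for name, pattern in motifs.items()}


# First 180 nt of SEQ ID NO: 2, copied from the sequence listing.
seq2_180 = "".join([
    "CATTTATTTC", "TTTCCTATAC", "CAAAATCAAA", "ATTGAGATTC", "GAAACGTCAA", "TAGATCGAAA",
    "CAAAGAAGCG", "ATCACACACA", "AAAAAAACTC", "ATTGGATAAC", "GATTAACCTA", "AGGAAAACTA",
    "AAGAGGTTTG", "ATTGATCGTC", "TATATATGAA", "CTAAAATTCC", "AGTAACGATT", "CCGATCACCT",
])
print(find_motifs(seq2_180))
# {'TATA (TATAWAW)': [141], 'zeste (GTGAGTG)': []}
```

In this excerpt the scan reports position 141, which agrees with the first TATA box listed above.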

The activity of the claimed sequence in driving the expression of a foreign gene, gusA, in plant cells was compared to the activity of the 35S cauliflower mosaic virus promoter in Arabidopsis protoplasts. Four expression vectors, p35SGUS (having the 35S cauliflower mosaic virus promoter operably linked to the gusA gene), pUN-GUS (as described above), pASR-GUS (as described above), and a control vector lacking a promoter operably linked to the gusA gene (DNA-), were introduced into the protoplasts by a PEG-mediated transformation procedure. One day after transformation, the GUS activity was determined. The 35S CaMV promoter controlling the gusA sequence produced GUS activity at 1 nmol MU/hr/1,000 protoplasts, while the claimed sequence ligated to the gusA coding sequence through its 3' end (SEQ ID NO: 2) induced GUS activity at a level of 1.72 nmol MU/hr/1,000 protoplasts (see Fig. 2). The claimed sequence ligated to the gusA coding sequence in the opposite orientation (SEQ ID NO: 3) exhibited a low level of GUS activity in Arabidopsis.
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Purdue Research Foundation,
Hodges, Thomas K.
Lysnik, Leszek A

(ii) TITLE OF INVENTION: Regulatory Element For Expressing Genes In Plants

(iii) NUMBER OF SEQUENCES: 6

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Barnes & Thornburg

(B) STREET: 11 S. Meridian

(C) CITY: Indianapolis

(D) STATE: Indiana

(E) COUNTRY: USA

(F) ZIP: 46204

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Breen, John P.

(B) REGISTRATION NUMBER: 38,833

(C) REFERENCE/DOCKET NUMBER: 3220-29933

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (317) 231-7745

(B) TELEFAX: (317) 231-7433

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5285 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Arabidopsis thaliana

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GAATTCCAGC GTGGAAGAGA CCAGGACAAC AAACAGCGAG TTTGTATAAA GAAGCCCAAC 60 

CACCGGGAGG AGTAAGAGAC GAATCCGCCG CGGTGGATGA AGAAGCGGAT GCGGCGGCGG 120 

AGGGAGGAAG AGGGAGGAGA TCGGAGATGA CGGTGGGGAT AGCGGAGTAG CACTGAGTTT 180 

TACAACAAAC GTTTCGAACA ATTGAGGAGA ATTTGGTGCA GACGCAAGCG AGACGAGCCC 240

AGTTTCTTGG ATCATCTTGG AGCTTGAAGA AGATGTTGAA GACTACGTCT TCTGGTATAC 300

AAGAGAAGAC AGATTCCGCC ATGGATCGCC TTCTTCTCTC TAGGCGGCTC CTTTCTTATC 360

CAAATTCACT TATACTGTTA TGGGTCCGGT ACGCGTAAAC CGGGAATAGT CTTAACTGTT 420

CTTAAGGTGG GTCACAGATT CACTAACACC CACACAAAGG CAAGTAAGTA ATGCGCAACA 480

GCTCTCGAAA ATGACATCGT ACGGACTGAA CTAAAATGTA AAGGGTCCGG GTATCAAAAT 540

GAGTTCAATG CACATGTCTT TTTAGGTTCA TTTATTGTGA ACGTTTTCAA AATTTTAATA 600

TCGAATTGTG AGCTTTTGAA TTAAGTTTGG TATTCGACAG TAATTTTTGA TAGTTCGTTT 660

TAAGCACTAA CTATATTAGC AAGTCATATA AATCAGCTGA GCTTAGCTCA TAAACTGATG 720

ATGACTGAGT ATATATCATT CCCATGTGCA AACCCAAGCT AATAAGAATG AAATACACAA 780

CTAGTTTTTC AACTTCTCAT ACATAAGAGA GATCATCTTT ATGAGAATCT TCCAACAGAC 840

CCAGCTCTTT TACCTGATAT GAGATGTTTC TTTACCTGTA AAACAGTTGG AAAGAGATTC 900

AGAGAAAGAG TGAGAATGTT CCAAGCAAGT TAAAAAGAGT GTGCATTACC GAGATGGACT 960

CCTGCTTCTG CTCTTGCTCC TACCACGGAT AGGGCTCAGC TGCTTGCTAG GGCTCTTACT 1020

TGCTTCCTTC TGAACCTGCA ACACACACAA TTCAATCCAG ATGATGAGAA TGTAATTAGC 1080

TAGACATAGA TCATGTCTTT TGGATAGTAT GGATTGAAAA CCGAAAATAT TGTGCTGATA 1140

GTATAGCGGA GTAAAAAGTG TATTAGAGAT AGATGACTTA GAGAGGGTAG AGGAGATCTT 1200

GACCTTGAAC GAGATCTGCA AAAGTCCGAG AAACAAATCC AGTTTTAAAA ATCCAATATT 1260

TCCGTGTTTA ATAGCTGTTT CACCAAGACC CTTGATGCCT CATTCCATGT CAGACATCAA 1320

AACACAATCA CAAGAAGCCA AAAAGGAAAA AACATATCTT CAGACTATTT TTAGCAGAAA 1380

ATCAAACCAA CACAAGCCTG CTTTTTGTTA AAACAGATGG TAGAGATGGA GAATATCAAA 1440

GCAAATCTAA TTTTATCAAA CCCTGTGTCC AGAACAGCAT CGGTTCCATG AGAATCCAGA 1500

TCGCTTGTCG ATTAGCAGGT AATGAAAATA TCTGTCCTAG CTCGATGGGT CCATTTTGAT 1560

GACACTATAT CAATGCGATC CAATGTCTCT CCACTGTTAC CCATTTAGCA GCAATGGATA 1620

TTATCAGAAA ACGAACTTGC CCATTTAAGA AAGAGCATAT ACCTTTGAGA CGAAACAGTT 1680

GTGTACAACC AACCAAGACA TTCCATGATC CAGATACTAT TTCCCATATT TTAGTTGATT 1740

GTATGTATAT CCATATCTAA GAAACAAAAC CATTCTCAAC ACTATAATTA TAAAAGACCA 1800

GACTTTCAAA GGAAATAAAT CGTGTACCCT TAACTAGAAG CAATCATCAT TCTCAATCAA 1860

CAACTCGGAG TCATCCCAGT GACATCTTTT AATGTGATGT CACCAAACTT CAAGGGAAGA 1920

GCTAAATAGA GCCTATGTTA TGTTTTATGT TGGATATTTT AGCATAAACA TTATAGAAGA 1980

AATGAAGCAT ACCCTCGTGG AGACAGTGAC CTCGACTTAG AGCGGGAGCG AGAGCGAGGA 2040

GATCTCGATG TAGATTTTGC AGGCGATCTA CGCAAAGATT TAGCCTTTGG ACTTCTGCTC 2100

TTGCTCCTGC TTCTGCTGCG GCTACGACTA CGGCTGGGAC TCCGTCCACG GCTGCGGCTC 2160

TTAGAATAGG ATCTTCCACG GCTGGGGCTC CTCGAATCCC TCCTTGAATC ATATTCTCTA 2220

ACCTGTAATG ACATAGGGAA ATGTTAAGTG CAATAAAAAC GAATGCAGCT CTCAGAATTC 2280

TTGTCTTTAA CATACCCGAA CATATTCATG AGAAAACGCA TTCCGAAACT CTGTGTCATC 2340

GAGCTTTTTT TATCTGGACA AAATAAAGAA TATAATCATG AGTAATCAAG GATGAACACA 2400

TTCTCAGCCA CAGTCCCACA CAGAAAATCA AATAATATGA AGAGGAAGAA AAACATCTCA 2460

CCGCATATTT CATGTCCTCG TAGCTGGTAT AATCTACAAT TCCAGTTGTA CCTGTAAATA 2520

AATAGTCCAC GATATAGATT TTTTAAAGCA TGCACAGACT AGTTAAAACA GGATTTAAAG 2580

GCAGAAAACC CAAACAGCTT TAGACACATC TCTATTCTTG GGTAAGACAT GAGGATTTAC 2640

CTCTACCATC ACGAAACACT TGAGAAAAAC AAACTTCTCC TCCTTTACGC ATGTGATCCT 2700

TCAAAACAAA GTGATATGTC AACATTCAGA AATCGTAGAA AATATAGGAA CGACAATGAG 2760

GAATCTGTCC ACAACTGTGT AATCACCTTG AGGTCTTGCC AGGACGCAGA TGAAGGCAAA 2820

CCTGACACTA CAACTGTATA GTGGAAAATC TTAATTTAGT GATTTCTCCT AAAACTTATG 2880

AATACACTAA AGCTAAACAT ATCATATGTA CCGCGGTACT CTGATCTCCT AGATGGTCCA 2940

CGTTCACGAC CACCACCGTC ACCACCACCA CGACCGCCAC GACCACGACC ACTATAACTA 3000

CCGCGTGCAT CATGTGATGA ACGCCTCCCA CCATGAGCTA GTTCCACCTG CAATGGCCAA 3060

CACACATAAA TTATGTTTGG CTACCAGTCA ACAATACAAA GTTTGTGTAA AAATTCTGAA 3120

ATTTGATGAT TAACAAACCC GTAAATGATG CCCATCAAAG TCATAACCAT CACGGCCATA 3180

AATTGCATCA TCAGCATCAC GAGCATCCTC AAACTAAATC ACATATATCA CAAAACATTA 3240

GTGGTAGTAT CTCCCAACAT TTGAAAACTC ATGAACACTC AACAACAACG AAGAGCCTAA 3300

CTGTAAAATC AACAACAAGC CTTTATAAAA CATGTGGTTG CATAAAAAAT CTGACCTCGA 3360

CGAATGCATA GCCTGGAGGC CTCGGCGGAA TCTTCAAATC GATTTGAACA ACAGGTCCAT 3420

ACTTCAAAAA AAATAAGGAA GAACAATTCT TAAGGAAACT TCTTCCAAAT AAAATCAGAA 3480

TCCAACAATT CCGAAGATGC ATTCAATTTC TTTCACATGC AAATCTTGTA AAGACATATT 3540

CATCACATAA CACAAAATTC GATCCTGAGT TCTGAGTTCT TAAATTAGGA GAAACGAGTA 3600

ATTTACCTTA CTGAACAAGT CTTCAACTTC TCTTTCACGG ATATCGCCGG GGAAGGTCCC 3660

GACGTAAATC GTTCTACTCG AACGGCTGCT CATTTATTTC TTTCCTATAC CAAAATCAAA 3720

ATTGAGATTC GAAACGTCAA TAGATCGAAA CAAAGAAGCG ATCACACACA AAAAAAACTC 3780

ATTGGATAAC GATTAACCTA AGGAAAACTA AAGAGGTTTG ATTGATCGTC TATATATGAA 3840

CTAAAATTCC AGTAACGATT CCGATCACCT GAGAGAAAAT TCCGATGGAA GAGAAGAAGA 3900

AAGGCGAAAA TTGAAATCTG ACTAGGGCTT TCGAATACCA TAGAGATCAT CACGTGAGTC 3960

ACGTGACCGA CCGGGTACGT ATTAAAATAC ATTGTGTCTT GACCGTATAA AATACATTTG 4020

ACCCGTTTTG CAACAAATCG TAATCTTCAA TCAAAAGCTC TTAAACCCAA AAGAACAATT 4080

CCAAATCTTC AATACTTGAT ATTTCTCAAA GAACTTGAAA ACAACACAGA TCCATTCCCA 4140

ATTTCAGATT CACTCAAAAA GGATTTTTCT TTTTTCATTT TCGCTTTTTG TGATCTGGAA 4200

AGTTGTTACC TTTAACAATG TCTCCGAAAC ATCTAGAGTC ATCACGAAGC TCTATTGAAT 4260

CATGCACTTC ACAGCTTCTC TCATGGCGAC CATTTCACCG CTCCAAAACC CTAGACTCAT 4320

CTGACCAACC ACCGCAGACC AATGGGTTTC ACTCCTTTAC TCCCAAACGC CCTTGCTTCT 4380

CCGATCGATC CACTTCTTTC ACCATCGAAG CTATGAGCCG TCTCTCACTC GCCGACGACG 4440

ACAATGGAGG GAAGACATTA TCAGCTTCCA ATTACAGCAA TAGAGGAAGT TTCAGGTTAG 4500

TAGCGAGGAA GCGGCGGCGG CGTAATTCGA GATCGGTGTC TGGTCGGAGT AGTGATCGGA 4560

GTGGGACTCG GAGATGTTGC TCCATTGGTG CTCATGGGAC TTGTTCGGAT TTGCCTTTCG 4620

CTGTTGGTAC AGATTCAAGT GGAGAGCTTT TTGGTGAAGC GAATTGGGCT TCTGATGTGA 4680

GTGAGGCGGC GAGGAATTCA CGGAGAGAGC GGCGAGATTC TGGTGGAGAG AAGGAAGCTT 4740

CTGGTGGATT TGGATTTGCT AATGGAGTTG ATCCAATGGG GAATGAATCT GGGTATGGGA 4800

GTGAGCCTGG TTACAGAGGT GATGCTGAGT TTGGCTATGG TGATGAATTT GATGATGAAG 4860

AAGAAGATGT CGAGCCATTG TTTTGGGGAG GTATTAAATT CAGAGACTTT TTATAGCAAT 4920

TGTGTTCCAT CTTGAGATTC GTGGTTTTTG CTATGAAGAT TTGGAGATTG ATCATCATTG 4980

ATTAGATTAA AGATGACAAC TTTAGTGTTA TTTCTTCTGA TGAAAATGAG TCTGATTTTG 5040

CTCTGCTTGT CTATTATGGC ATTGCCTCAT AGGAATTGTC AGAAAGTTGT CAAATTTTGA 5100

TATGTTTAGT GATTGGTGAG TGTTTTGGAT GGAATTGGGT TCTTATCATG TTAGGTCATT 5160

GTCTGAAATG GATATGTATG TACTTGGTAT TTTGATATGT TTAGTGATTG GTGAGTGTTT 5220

TGGATTTGGA GCAGATACAG ATTCCACAAT GGGGATGTCT GGTGAGACAA ATCTCAGATA 5280

GTAAA 5285




(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 530 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

CATTTATTTC TTTCCTATAC CAAAATCAAA ATTGAGATTC GAAACGTCAA TAGATCGAAA 60

CAAAGAAGCG ATCACACACA AAAAAAACTC ATTGGATAAC GATTAACCTA AGGAAAACTA 120

AAGAGGTTTG ATTGATCGTC TATATATGAA CTAAAATTCC AGTAACGATT CCGATCACCT 180

GAGAGAAAAT TCCGATGGAA GAGAAGAAGA AAGGCGAAAA TTGAAATCTG ACTAGGGCTT 240

TCGAATACCA TAGAGATCAT CACGTGAGTC ACGTGACCGA CCGGGTACGT ATTAAAATAC 300

ATTGTGTCTT GACCGTATAA AATACATTTG ACCCGTTTTG CAACAAATCG TAATCTTCAA 360

TCAAAAGCTC TTAAACCCAA AAGAACAATT CCAAATCTTC AATACTTGAT ATTTCTCAAA 420

GAACTTGAAA ACAACACAGA TCCATTCCCA ATTTCAGATT CACTCAAAAA GGATTTTTCT 480

TTTTTCATTT TCGCTTTTTG TGATCTGGAA AGTTGTTACC TTTAACAATG 530



(2) INFORMATION FOR SEQ ID NO: 3: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 530 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

CATTGTTAAA GGTAACAACT TTCCAGATCT CAAAAAGCGA AAATGAAAAA AGAAAAATCC 60 

TTTTTGAGTG AATCTGTTAT TGGGAATGGA TCTGTGTTGT TTTCAAGTTC TTTGAGAAAT 120 

ATCAAGTATT GAAGATTTGG AATTGTTCTT TTGGGTTTAA GAGCTTTTGA TTGAAGATTA 180

CGATTTGTTG CAAAACGGGT CAAATGTATT TTATACGGTC AAGACACAAT GTATTTTAAT 240 

ACGTACCCGG TCGGTCACGT GACTCACGTG ATGATCTCTA TGGTATTCGA AAGCCCTAGT 300

CAGATTTCAA TTTTCGCCTT TCTTCTTCTC TTCCATCGGA ATTTTCTCTC AGGTGATCGG 360 

AATCGTTACT GGAATTTTAG TTCATATATA GACGATCAAT CAAACCTCTT TAGTTTTCCT 420 

TAGGTTAATC GTTATCCAAT GAGTTTTTTT TGTGTGTGAT CGCTTCTTTG TTTCGATCTA 480

TTGACGTTTC GAATCTCAAT TTTGATTTTG GTATAGGAAA GAAATAAATG 530 






(2) INFORMATION FOR SEQ ID NO: 4: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 186 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Oryza sativa 



(vii) IMMEDIATE SOURCE: 
(B) CLONE: RTS-1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

GAGCCGCCCA CCGATGACGG CGCGGTCCGG GTGGCGGCGG GGCTGACGAA GTGCGTGTCC 60 

GGGTGCGGTA GCAAGGTGAC CTCCTGCTTG CTCGGCTGCT ACGGCGGCGG CGGCGGCGCC 120 

GCCGCCGCCG CGACGGCGAT GCCGTTCTGC GTCATCGGCT GCACCAGCGA CGTCTTGTCC 180 
TGCGCC 186
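Each listing above prints its sequence 5' to 3' in groups of ten bases, six groups per row, with the position of the row's last base at the right. The minimal Python sketch below joins such rows into one string and checks every trailing position against the running base count; it is shown on SEQ ID NO: 4. The function name `parse_listing_rows` is hypothetical and is not part of any sequence-listing tool, and the sketch assumes every non-empty row ends with a position number.

```python
import re


def parse_listing_rows(text: str) -> str:
    """Join the rows of a sequence listing into a single string.

    Hypothetical helper. Assumes each non-empty row holds space-separated
    groups of bases followed by the position of the row's last base.
    """
    bases: list[str] = []
    count = 0
    for line in text.strip().splitlines():
        tokens = line.split()
        if not tokens:
            continue  # blank lines carry no sequence
        *groups, position = tokens
        for group in groups:
            if not re.fullmatch(r"[ACGTN]+", group):
                raise ValueError(f"unexpected characters in group {group!r}")
            bases.append(group)
            count += len(group)
        # The printed position must equal the number of bases read so far.
        if count != int(position):
            raise ValueError(f"row ends at base {count}, but the listing says {position}")
    return "".join(bases)


SEQ_ID_NO_4 = """
GAGCCGCCCA CCGATGACGG CGCGGTCCGG GTGGCGGCGG GGCTGACGAA GTGCGTGTCC 60
GGGTGCGGTA GCAAGGTGAC CTCCTGCTTG CTCGGCTGCT ACGGCGGCGG CGGCGGCGCC 120
GCCGCCGCCG CGACGGCGAT GCCGTTCTGC GTCATCGGCT GCACCAGCGA CGTCTTGTCC 180
TGCGCC 186
"""

seq4 = parse_listing_rows(SEQ_ID_NO_4)
print(len(seq4))  # 186, matching the declared LENGTH
```

A mismatch between a printed position and the running count raises `ValueError`, which makes dropped or duplicated groups easy to spot.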

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 38 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
AATAAATGAG CCCGGGTGGT CAGTCCCTTA TGTTACGT 38 

(2) INFORMATION FOR SEQ ID NO: 6: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 38 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
TAACAATGTC CCCGGGTGGT CAGTCCCTTA TGTTACGT 38 
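SEQ ID NO: 5 and SEQ ID NO: 6 are both 38 bases long and differ only in their first ten bases; the remaining 28 bases are identical. A short Python check of that observation follows. The helper name is hypothetical. Because both strings are written 5' to 3', a common suffix is a shared 3' end, and reversing both inputs lets `os.path.commonprefix`, which compares plain strings character by character, find it.

```python
import os

# SEQ ID NO: 5 and SEQ ID NO: 6 with the spaces between ten-base groups removed.
SEQ_ID_NO_5 = "AATAAATGAGCCCGGGTGGTCAGTCCCTTATGTTACGT"
SEQ_ID_NO_6 = "TAACAATGTCCCCGGGTGGTCAGTCCCTTATGTTACGT"


def shared_3prime_end(a: str, b: str) -> str:
    """Hypothetical helper: longest common suffix of two 5'->3' sequences."""
    # Reverse both strings so the shared 3' end becomes a shared prefix,
    # then reverse the result back into 5'->3' order.
    return os.path.commonprefix([a[::-1], b[::-1]])[::-1]


tail = shared_3prime_end(SEQ_ID_NO_5, SEQ_ID_NO_6)
print(len(tail), tail)                      # 28 CCCGGGTGGTCAGTCCCTTATGTTACGT
print(SEQ_ID_NO_5[:10], SEQ_ID_NO_6[:10])  # AATAAATGAG TAACAATGTC
```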
CLAIMS:

1. A substantially pure nucleic acid sequence comprising a sequence as set forth in SEQ ID NO: 2.

2. A DNA sequence comprising a 20 base pair nucleotide portion identical in sequence to a consecutive 20 base pair portion of the sequence as set forth in SEQ ID NO: 2.
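Claims 2 and 10 turn on whether a sequence contains a 20 base pair portion identical to a consecutive 20 base pair portion of SEQ ID NO: 2. The literal string comparison behind that wording can be sketched in Python: collect every 20-base window of the reference, then test whether any 20-base window of the query is among them. This is an illustration only, not a statement of how the claims are construed. The function names are hypothetical, only the first 60 bases of SEQ ID NO: 2 are copied in for brevity, and the `both_strands` option (also searching the reverse complement of the reference) is an assumption that the claim text does not address.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string (hypothetical helper)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All consecutive k-base windows of seq; empty if seq is shorter than k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_window(query: str, reference: str, k: int = 20,
                  both_strands: bool = False) -> bool:
    """True if some k-base window of query equals a consecutive k-base
    portion of reference. both_strands=True is an assumed option that also
    searches the reverse complement of reference."""
    query, reference = query.upper(), reference.upper()
    targets = kmers(reference, k)
    if both_strands:
        targets |= kmers(reverse_complement(reference), k)
    return not kmers(query, k).isdisjoint(targets)


# First 60 bases of SEQ ID NO: 2 only; a real comparison would use all 530.
SEQ2_HEAD = "CATTTATTTCTTTCCTATACCAAAATCAAAATTGAGATTCGAAACGTCAATAGATCGAAA"

hit = "GGGG" + SEQ2_HEAD[5:25] + "GGGG"  # carries bases 6-25 verbatim
miss = "A" * 30                          # shares no 20-base window
print(shares_window(hit, SEQ2_HEAD))   # True
print(shares_window(miss, SEQ2_HEAD))  # False
```

Building the window set once makes each query check linear in the query length, which matters when screening many candidate sequences against the full 530-base reference.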

3. A recombinant expression vector comprising the DNA sequence of claim 2 operably linked to a gene.

4. The expression vector of claim 3, further comprising a gene encoding a eukaryotic selectable marker.

5. The expression vector of claim 4, further comprising nucleic acid sequences that enable replication of the expression vector in a bacterial host, and a gene encoding a bacterial selectable marker.

6. A plant entity consisting essentially of a plant cell, seed or plant produced from the in vitro introduction of an exogenous nucleic acid sequence of claim 3.

7. An expression vector comprising the DNA sequence of claim 2 and a polylinker sequence.

8. The expression vector of claim 7, further comprising a gene encoding a eukaryotic selectable marker.

9. The expression vector of claim 8, further comprising nucleic acid sequences that enable replication of the expression vector in a bacterial host, and a gene encoding a bacterial selectable marker.

10. A transgenic plant entity comprising plant cells transformed with a DNA sequence comprising a regulatory element operably linked to an exogenous gene, said regulatory element comprising a 20 base pair nucleotide portion identical in sequence to a consecutive 20 base pair portion of the sequence as set forth in SEQ ID NO: 2.